NSF PAR Search | NSF Public Access Repository

Trinity IV: predictions for supermassive black holes at z ≳ 6

https://doi.org/10.1093/mnras/stae1447

Zhang, Haowen; Behroozi, Peter; Volonteri, Marta; Silk, Joseph; Fan, Xiaohui; Aird, James; Yang, Jinyi; Wang, Feige; Tee, Wei Leong; Hopkins, Philip F (June 2024, Monthly Notices of the Royal Astronomical Society)

ABSTRACT We present predictions for the high-redshift halo–galaxy–supermassive black hole (SMBH) connection from the Trinity model. Matching a comprehensive compilation of galaxy (0 ≤ z ≤ 13) and SMBH data sets (0 ≤ z ≤ 6.5), Trinity finds: (1) The number of SMBHs with M• > 10⁹ M⊙ in the observable Universe increases by five orders of magnitude from z ∼ 10 to z ∼ 2, and by another factor of ∼3 from z ∼ 2 to z = 0; (2) The M• > 10⁹ and 10¹⁰ M⊙ SMBHs at z ∼ 6 live in haloes with ∼(2 − 3) and (3 − 5) × 10¹² M⊙; (3) the newly discovered JWST AGN candidates at 7 ≲ z ≲ 11 are overmassive compared to the intrinsic SMBH mass–galaxy mass relation from Trinity, but they are still broadly consistent with Trinity predictions for flux-limited AGN samples with Lauer bias. This bias favours the detection of overmassive SMBHs due to higher luminosities at a fixed Eddington ratio. However, UHZ1’s M•/M* ratio is still some 1 dex higher than Trinity AGNs, indicating a discrepancy; (4) Trinity underpredicts the number densities of GN-z11 and CEERS_1019 analogues. But given the strong constraints from existing data in Trinity, the extra constraint from GN-z11 and CEERS_1019 does not significantly change Trinity model results; (5) z = 6–10 quasar luminosity functions will reduce uncertainties in the Trinity prediction of the z = 6–10 SMBH mass–galaxy mass relation by up to ∼0.5 dex. These luminosity functions will be available with future telescopes, such as Roman and Euclid.
Full Text Available
A post-starburst pathway for the formation of massive galaxies and black holes at z > 6

https://doi.org/10.1038/s41550-025-02628-1

Onoue, Masafusa; Ding, Xuheng; Silverman, John D; Matsuoka, Yoshiki; Izumi, Takuma; Strauss, Michael A; Ward, Charlotte; Phillips, Camryn L; Ito, Kei; Andika, Irham T; et al (August 2025, Nature Astronomy)

Free, publicly-accessible full text available August 11, 2026
Trinity II: The luminosity-dependent bias of the supermassive black hole mass–galaxy mass relation for bright quasars at z = 6

https://doi.org/10.1093/mnrasl/slad060

Zhang, Haowen; Behroozi, Peter; Volonteri, Marta; Silk, Joseph; Fan, Xiaohui; Aird, James; Yang, Jinyi; Hopkins, Philip F (April 2023, Monthly Notices of the Royal Astronomical Society: Letters)

ABSTRACT Using recent empirical constraints on the dark matter halo–galaxy–supermassive black hole (SMBH) connection from z = 0–7, we infer how undermassive, typical, and overmassive SMBHs contribute to the quasar luminosity function (QLF) at z = 6. We find that beyond Lbol = 5 × 10⁴⁶ erg s⁻¹, the z = 6 QLF is dominated by SMBHs that are at least 0.3 dex above the z = 6 median M•–M* relation. The QLF is dominated by typical SMBHs (i.e. within ±0.3 dex around the M•–M* relation) at Lbol ≲ 10⁴⁵ erg s⁻¹. At z ∼ 6, the intrinsic M•–M* relation for all SMBHs is slightly steeper than the z = 0 scaling, with a similar normalization at M* ∼ 10¹¹ M⊙. We also predict the M•–M* relation for z = 6 bright quasars selected by different bolometric luminosity thresholds, finding very good agreement with observations. For quasars with Lbol > 3 × 10⁴⁶ (10⁴⁸) erg s⁻¹, the scaling relation is shifted upwards by ∼0.35 (1.0) dex for 10¹¹ M⊙ galaxies. To accurately measure the intrinsic M•–M* relation, it is essential to include fainter quasars with Lbol ≲ 10⁴⁵ erg s⁻¹. At high redshifts, low-luminosity quasars are thus the best targets for understanding typical formation paths for SMBHs in galaxies.
Full Text Available
A SPectroscopic Survey of Biased Halos In the Reionization Era (ASPIRE): Broad-line AGN at z = 4−5 Revealed by JWST/NIRCam WFSS

https://doi.org/10.3847/1538-4357/ad6565

Lin, Xiaojing; Wang, Feige; Fan, Xiaohui; Cai, Zheng; Champagne, Jaclyn B; Sun, Fengwu; Volonteri, Marta; Yang, Jinyi; Hennawi, Joseph F; Bañados, Eduardo; et al (October 2024, The Astrophysical Journal)

Abstract Low-luminosity active galactic nuclei (AGNs) with low-mass black holes (BHs) in the early universe are fundamental to understanding the BH growth and their coevolution with the host galaxies. Utilizing JWST NIRCam Wide Field Slitless Spectroscopy, we perform a systematic search for broad-line Hα emitters (BHAEs) at z ≈ 4–5 in 25 fields of the A SPectroscopic survey of biased halos In the Reionization Era (ASPIRE) project, covering a total area of 275 arcmin². We identify 16 BHAEs with FWHM of the broad components spanning from ∼1000 to 3000 km s⁻¹. Assuming that the broad line widths arise as a result of Doppler broadening around BHs, the implied BH masses range from 10⁷ to 10⁸ M⊙, with broad Hα-converted bolometric luminosities of 10^44.5–10^45.5 erg s⁻¹ and Eddington ratios of 0.07–0.47. The spatially extended structure of the F200W stacked image may trace the stellar light from the host galaxies. The Hα luminosity function indicates an increasing AGN fraction toward the higher Hα luminosities. We find possible evidence for clustering of BHAEs: two sources are at the same redshift with a projected separation of 519 kpc; one BHAE appears as a composite system residing in an overdense region with three close companion Hα emitters. Three BHAEs exhibit blueshifted absorption troughs indicative of the presence of high column density gas. We find that the broad-line-selected and photometrically selected BHAE samples exhibit different distributions in the optical continuum slopes, which can be attributed to their different selection methods. The ASPIRE broad-line Hα sample provides a good database for future studies of faint AGN populations at high redshift.
Full Text Available
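
As a rough illustration of how the Eddington ratios quoted in the ASPIRE abstract relate to black hole mass and bolometric luminosity, the sketch below applies the standard definition λ = Lbol / LEdd with LEdd ≈ 1.26 × 10³⁸ (M•/M⊙) erg s⁻¹. The function names and the sample inputs are assumptions made for illustration; the paper's own mass and luminosity estimators are not described in the abstract and are not reproduced here.

```python
# Standard Eddington luminosity coefficient for ionized hydrogen,
# in erg/s per solar mass (approximate textbook value).
L_EDD_PER_MSUN = 1.26e38


def eddington_luminosity(m_bh_msun: float) -> float:
    """Eddington luminosity in erg/s for a black hole mass in solar masses."""
    return L_EDD_PER_MSUN * m_bh_msun


def eddington_ratio(l_bol: float, m_bh_msun: float) -> float:
    """Ratio of bolometric luminosity (erg/s) to the Eddington luminosity."""
    return l_bol / eddington_luminosity(m_bh_msun)


if __name__ == "__main__":
    # Hypothetical source: M_BH = 1e8 Msun and L_bol = 1e45 erg/s,
    # both chosen inside the ranges quoted in the abstract.
    print(f"{eddington_ratio(1e45, 1e8):.3f}")
```

With these illustrative inputs the ratio comes out near 0.08, which falls inside the 0.07–0.47 range reported for the sample.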
A comprehensive evaluation of long read error correction methods

https://doi.org/10.1186/s12864-020-07227-0

Zhang, Haowen; Jain, Chirag; Aluru, Srinivas (December 2020, BMC Genomics)
Abstract Background: Third-generation single molecule sequencing technologies can sequence long reads, which is advancing the frontiers of genomics research. However, their high error rates prohibit accurate and efficient downstream analysis. This difficulty has motivated the development of many long read error correction tools, which tackle this problem through sampling redundancy and/or leveraging accurate short reads of the same biological samples. Existing studies to assess these tools use simulated data sets, and are not sufficiently comprehensive in the range of software covered or diversity of evaluation measures used. Results: In this paper, we present a categorization and review of long read error correction methods, and provide a comprehensive evaluation of the corresponding long read error correction tools. Leveraging recent real sequencing data, we establish benchmark data sets and set up evaluation criteria for a comparative assessment which includes quality of error correction as well as run-time and memory usage. We study how trimming and long read sequencing depth affect error correction in terms of length distribution and genome coverage post-correction, and the impact of error correction performance on an important application of long reads, genome assembly. We provide guidelines for practitioners for choosing among the available error correction tools and identify directions for future research. Conclusions: Despite the high error rate of long reads, the state-of-the-art correction tools can achieve high correction quality. When short reads are available, the best hybrid methods outperform non-hybrid methods in terms of correction quality and computing resource usage. When choosing tools, practitioners are advised to be careful with the few correction tools that discard reads, and to check the effect of error correction tools on downstream analysis. Our evaluation code is available as open-source at https://github.com/haowenz/LRECE .
Full Text Available
Real-time mapping of nanopore raw signals

https://doi.org/10.1093/bioinformatics/btab264

Zhang, Haowen; Li, Haoran; Jain, Chirag; Cheng, Haoyu; Au, Kin Fai; Li, Heng; Aluru, Srinivas (July 2021, Bioinformatics)

Abstract Motivation: Oxford Nanopore Technologies sequencing devices support adaptive sequencing, in which undesired reads can be ejected from a pore in real time. This feature allows targeted sequencing aided by computational methods for mapping partial reads, rather than complex library preparation protocols. However, existing mapping methods either require a computationally expensive base-calling procedure before using aligners to map partial reads or work well only on small genomes. Results: In this work, we present a new streaming method that can map nanopore raw signals for real-time selective sequencing. Rather than converting read signals to bases, we propose to convert reference genomes to signals and fully operate in the signal space. Our method features a new way to index reference genomes using k-d trees, a novel seed selection strategy and a seed chaining algorithm tailored toward the current signal characteristics. We implemented the method as a tool Sigmap. Then we evaluated it on both simulated and real data and compared it to the state-of-the-art nanopore raw signal mapper Uncalled. Our results show that Sigmap yields comparable performance on mapping yeast simulated raw signals, and better mapping accuracy on mapping yeast real raw signals with a 4.4× speedup. Moreover, our method performed well on mapping raw signals to genomes of size >100 Mbp and correctly mapped 11.49% more real raw signals of green algae, which leads to a significantly higher F1-score (0.9354 versus 0.8660). Availability and implementation: Sigmap code is accessible at https://github.com/haowenz/sigmap. Supplementary information: Supplementary data are available at Bioinformatics online.
Full Text Available
On the Complexity of Sequence-to-Graph Alignment

https://doi.org/10.1089/cmb.2019.0066

Jain, Chirag; Zhang, Haowen; Gao, Yu; Aluru, Srinivas (April 2020, Journal of Computational Biology)

Full Text Available
Validating Paired-end Read Alignments in Sequence Graphs

https://doi.org/10.1101/682799

Jain, Chirag; Zhang, Haowen; Dilthey, Alexander; Aluru, Srinivas (September 2019, Leibniz international proceedings in informatics)

Graph based non-linear reference structures such as variation graphs and colored de Bruijn graphs enable incorporation of full genomic diversity within a population. However, transitioning from a simple string-based reference to graphs requires addressing many computational challenges, one of which concerns accurately mapping sequencing read sets to graphs. Paired-end Illumina sequencing is a commonly used sequencing platform in genomics, where the paired-end distance constraints allow disambiguation of repeats. Many recent works have explored provably good index-based and alignment-based strategies for mapping individual reads to graphs. However, validating distance constraints efficiently over graphs is not trivial, and existing sequence to graph mappers rely on heuristics. We introduce a mathematical formulation of the problem, and provide a new algorithm to solve it exactly. We take advantage of the high sparsity of reference graphs, and use sparse matrix-matrix multiplications (SpGEMM) to build an index which can be queried efficiently by a mapping algorithm for validating the distance constraints. Effectiveness of the algorithm is demonstrated using real reference graphs, including a human MHC variation graph, and a pan-genome de-Bruijn graph built using genomes of 20 B. anthracis strains. While the one-time indexing time can vary from a few minutes to a few hours using our algorithm, answering a million distance queries takes less than a second.
Full Text Available
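
The idea in the abstract above, precomputing which vertex pairs satisfy a distance constraint by multiplying sparse matrices, can be sketched in a few lines. This is a toy illustration only, not the authors' algorithm: it assumes path length is counted in edges, stores the boolean adjacency matrix as a dict of sets, and uses a hand-written boolean product in place of an optimized SpGEMM library. The names `spgemm_bool`, `build_distance_index`, and `validate` are hypothetical.

```python
def spgemm_bool(a, b):
    """Boolean sparse matrix product; each matrix maps row -> set of columns."""
    out = {}
    for i, cols in a.items():
        reach = set()
        for k in cols:
            reach |= b.get(k, set())
        if reach:  # keep the result sparse: drop empty rows
            out[i] = reach
    return out


def build_distance_index(adj, d_min, d_max):
    """Return index[u] = vertices reachable from u by a path of d_min..d_max edges."""
    if not 1 <= d_min <= d_max:
        raise ValueError("expected 1 <= d_min <= d_max")
    index = {}
    power = {u: set(vs) for u, vs in adj.items()}  # A^1
    for k in range(1, d_max + 1):
        if k >= d_min:
            for u, vs in power.items():
                index.setdefault(u, set()).update(vs)
        if k < d_max:
            power = spgemm_bool(power, adj)  # A^(k+1)
    return index


def validate(index, u, v):
    """Constant-time check: is v within the allowed distance range of u?"""
    return v in index.get(u, set())


if __name__ == "__main__":
    # Small DAG with a bubble: 0 -> {1, 2} -> 3 -> 4
    adj = {0: {1, 2}, 1: {3}, 2: {3}, 3: {4}}
    index = build_distance_index(adj, d_min=2, d_max=3)
    print(validate(index, 0, 4), validate(index, 0, 1))
```

The one-time index build pays for the matrix products up front, so that each later query is a single set lookup, which mirrors the indexing-versus-query trade-off the abstract describes.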
Labeled graph sketches: Keeping up with real-time graph streams

https://doi.org/10.1016/j.ins.2019.07.019

Song, Chunyao; Ge, Tingjian; Ge, Yao; Zhang, Haowen; Yuan, Xiaojie (November 2019, Information Sciences)

Full Text Available
Accelerating Sequence Alignment to Graphs

https://doi.org/10.1109/IPDPS.2019.00055

Jain, Chirag; Misra, Sanchit; Zhang, Haowen; Dilthey, Alexander; Aluru, Srinivas (September 2019, IEEE International Parallel and Distributed Processing Symposium (IPDPS))

Aligning DNA sequences to an annotated reference is a key step for genotyping in biology. Recent scientific studies have demonstrated improved inference by aligning reads to a variation graph, i.e., a reference sequence augmented with known genetic variations. Given a variation graph in the form of a directed acyclic string graph, the sequence to graph alignment problem seeks to find the best matching path in the graph for an input query sequence. Solving this problem exactly using a sequential dynamic programming algorithm takes quadratic time in terms of the graph size and query length, making it difficult to scale to high throughput DNA sequencing data. In this work, we propose the first parallel algorithm for computing sequence to graph alignments that leverages multiple cores and single-instruction multiple-data (SIMD) operations. We take advantage of the available inter-task parallelism, and provide a novel blocked approach to compute the score matrix while ensuring high memory locality. Using a 48-core Intel Xeon Skylake processor, the proposed algorithm achieves peak performance of 317 billion cell updates per second (GCUPS), and demonstrates near linear weak and strong scaling on up to 48 cores. It delivers significant performance gains compared to existing algorithms, and results in run-time reduction from multiple days to three hours for the problem of optimally aligning high coverage long (PacBio/ONT) or short (Illumina) DNA reads to an MHC human variation graph containing 10 million vertices.
Full Text Available
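
For orientation, the sequential dynamic programming baseline that the abstract above refers to can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's parallel, blocked SIMD algorithm: each vertex carries a single character, vertices are numbered in topological order, all edits cost 1, and the alignment may start and end at any vertex. The function name `align_to_dag` and the example graph are hypothetical.

```python
def align_to_dag(labels, preds, query):
    """Minimum unit-cost edit distance between `query` and any path in a DAG.

    labels[v] is the character on vertex v; preds[v] lists predecessors of v,
    all with smaller indices (vertices are assumed topologically ordered).
    """
    m = len(query)
    start = list(range(m + 1))  # empty path: every query character is inserted
    rows = []                   # rows[v][j]: cost of query[:j] vs a path ending at v
    best = start[m]
    for v, label in enumerate(labels):
        # Best row over all predecessors; `start` lets a path begin at v.
        prev = start[:]
        for u in preds[v]:
            prev = [min(a, b) for a, b in zip(prev, rows[u])]
        row = [prev[0] + 1]  # vertex v consumed with no query character
        for j in range(1, m + 1):
            match = prev[j - 1] + (query[j - 1] != label)  # match or substitute
            skip_vertex = prev[j] + 1                      # vertex has no query char
            skip_query = row[j - 1] + 1                    # query char has no vertex
            row.append(min(match, skip_vertex, skip_query))
        rows.append(row)
        best = min(best, row[m])
    return best


if __name__ == "__main__":
    # Variation graph spelling ACGA or ACTA: 0:A -> 1:C -> {2:G, 3:T} -> 4:A
    labels = "ACGTA"
    preds = [[], [0], [1], [1], [2, 3]]
    print(align_to_dag(labels, preds, "ACTA"), align_to_dag(labels, preds, "AGA"))
```

Every vertex fills one row of length |query| + 1 and reads the rows of its predecessors, so the work grows with the product of graph size and query length. That cost is what motivates the parallel approach described in the abstract.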
